Course Information
Course Title
資訊檢索與文字探勘導論
Introduction to Information Retrieval and Text Mining 
Semester
103-1 
Offered to
College of Management, Department of Information Management
Instructor
陳建錦 
Course Number
IM5030 
Course Identifier
725 U3410 
Section
 
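Credits
Full/Half Year
Half year
Required/Elective
Elective
Class Time
Tuesday, periods 2, 3, 4 (9:10~12:10)
Classroom
Management Building II, Room 305 (管二305)
Remarks
This course is taught in Chinese and uses an English textbook.
Restricted to third-year undergraduate students and above.
Enrollment limit: 25 students
Ceiba course web page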
http://ceiba.ntu.edu.tw/1031IRTM 
Course introduction video
 
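Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course Overview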

This course covers the concepts and algorithms of information retrieval and text mining. Theoretical topics include term extraction, term weighting, the vector space model, the binary independence model, language models, IR system evaluation, Naive Bayes classification, Rocchio classification, kNN, k-means, HAC, PageRank, and HITS. Programming assignments and term projects help students understand how an IR system is developed.

Course Objectives
The course is aimed at graduate students and senior undergraduate students who are interested in information retrieval and text mining. The first part of the course covers the basics of information retrieval. Research topics such as text classification and clustering are then discussed to provide a comprehensive study of information retrieval and text mining.
Course Requirements
Prerequisites: a programming language, data structures, and probability.
Expected weekly study hours after class
 
Office Hours
 
Required Reading
To be announced
References
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999.
William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms, Prentice Hall, 1992.
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, 1999.
 
Grading
(For reference only)
   
Course Schedule
Week
Date
Topic
Week 1
9/16  The Term Vocabulary
Week 2
9/23  The Term Vocabulary
PAT Tree and Chinese Keyword Extraction
*** Programming Assignment 1
Week 3
9/30  PAT Tree and Chinese Keyword Extraction
Scoring, Term Weighting and the Vector Space Model
Week 4
10/07  Scoring, Term Weighting and the Vector Space Model
Evaluation in Information Retrieval
*** Programming Assignment 2
Week 5
10/14  Evaluation in Information Retrieval
Probabilistic Information Retrieval
Week 6
10/21  Probabilistic Information Retrieval
Week 7
10/28  Language Models for Information Retrieval
Week 8
11/04  Language Models for Information Retrieval
Week 9
11/11  Midterm
Week 10
11/18  Link Analysis
Week 11
11/25  Link Analysis
Text Classification and Naive Bayes
Week 12
12/02  Text Classification and Naive Bayes
Week 13
12/09  Vector Space Classification
** Programming Assignment 3
Week 14
12/16  Hierarchical Clustering
Week 15
12/23  Hierarchical Clustering
Flat Clustering
Week 16
12/30  Flat Clustering
** Programming Assignment 4
Week 17
1/06  Flat Clustering
Topic Detection and Incremental Clustering
Week 18
1/13  Final